update Simpleaf modules, subworkflow #424
base: dev
Warning: A newer version of the nf-core template is available. Your pipeline is using an old version of the nf-core template: 3.1.1. For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.
Few minor things
Hi @grst, I think I am pretty happy with the code now. Interestingly, although I did not touch the code for other aligners, all CI tests except those for simpleaf failed. I tested my changes locally and everything worked. We can discuss linting and the output structure now. The current output layout is as follows. The biggest change is I removed the
Please let me know what you think! We are getting close!
few final minor things. Happy to merge once those are addressed!
Also, please update the CHANGELOG :)
if "gene_symbol" in adata.var.columns:
    adata.var['gene_ids'] = adata.var['gene_symbol']
else:
    adata.var['gene_ids'] = adata.var['gene_id']

adata.var['gene_versions'] = adata.var['gene_ids']
I don't know what the anndata generated by simpleaf looks like, so just commenting to be sure we are on the same page. For consistency across all aligners, in scrnaseq, we expect:
- adata.var_names are always (ensembl) gene ids without version suffix.
- adata.var["gene_symbol"] contains human-readable gene symbols/names. They don't need to be unique.
- adata.var["gene_versions"] may contain ENSG IDs including the gene version.
adata.var can contain arbitrary other columns, but I'd avoid redundancies. E.g. we don't need gene_id and gene_ids and the same in the index; just get rid of the redundant columns in that case.
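To make the convention above concrete, here is a minimal sketch of how a `var` table could be brought into that layout. It is not the code from this PR: `normalize_var` is a hypothetical helper, a plain pandas DataFrame stands in for `adata.var`, and the input column names (`gene_id`, possibly versioned, plus an optional `gene_symbol`) are assumptions. Falling back to the unversioned ID when no symbol column exists is also an assumption, not something the pipeline prescribes.

```python
import pandas as pd


def normalize_var(var: pd.DataFrame) -> pd.DataFrame:
    """Return a new table following the layout described above (hypothetical helper).

    Assumes `var` has a `gene_id` column (Ensembl IDs, optionally with a
    `.N` version suffix) and, optionally, a `gene_symbol` column.
    """
    versioned = var["gene_id"].astype(str)
    # Strip a trailing ".<digits>" version suffix, if present.
    unversioned = versioned.str.replace(r"\.\d+$", "", regex=True)

    if "gene_symbol" in var.columns:
        symbols = var["gene_symbol"].astype(str)
    else:
        # Assumption: without symbols, reuse the unversioned ID.
        symbols = unversioned

    # Build a fresh frame so no redundant gene_id/gene_ids columns survive.
    # .to_numpy() avoids accidental alignment on the old index.
    return pd.DataFrame(
        {
            "gene_symbol": symbols.to_numpy(),
            "gene_versions": versioned.to_numpy(),
        },
        index=pd.Index(unversioned.to_numpy()),
    )


# Toy input with made-up identifiers, for illustration only.
toy_var = pd.DataFrame(
    {
        "gene_id": ["ENSG00000000001.4", "ENSG00000000002"],
        "gene_symbol": ["GENE_A", "GENE_B"],
    }
)
toy_out = normalize_var(toy_var)
```

With an AnnData object, the result would be assigned back along the lines of `adata.var = normalize_var(adata.var)`, assuming the row order is unchanged.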
I think I addressed all your comments. As this is my first PR to scrnaseq and I made many major changes, can we invite more reviewers before we merge it? It would be great if other maintainers could go through the changes, especially the documentation part. Thanks,
Thanks for the updates!
I cannot promise, but I can try to find some time to review.
Hi there,
Thanks for the work on it.
I have added a few sincere questions and some comments for changes (if y'all agree) :)
tests/main_pipeline_alevin.nf.test
{assert new File( "${outputDir}/results_simpleaf/simpleaf/Sample_X/simpleaf_quant/af_quant/alevin/quants_mat.mtx" ).exists()},
{assert new File( "${outputDir}/results_simpleaf/simpleaf/Sample_X/simpleaf_quant/af_quant/alevin/quants.h5ad" ).exists()},
{assert new File( "${outputDir}/results_simpleaf/simpleaf/mtx_conversions/Sample_Y/Sample_Y_raw_matrix.h5ad" ).exists()},
{assert new File( "${outputDir}/results_simpleaf/simpleaf/Sample_Y/simpleaf_quant/af_quant/alevin/quants_mat_cols.txt" ).exists()},
Can any of these alevin files now be included as snapshots with the new version?
I remember that their sort order could vary before, but I'm not sure about the newest version.
For the mtx file and its column and row names, unfortunately they can still vary because of parallelization. For the h5ad file, what I can do is sort it in the mtx_to_h5ad module to make the order fixed. Do you think that is necessary?
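As an illustration of why a fixed order would make the output snapshot-friendly, here is a small sketch of an order-independent checksum for a count matrix. It is not the `mtx_to_h5ad` implementation: `canonical_md5` is a hypothetical helper, and the matrix is assumed to be given as row labels (cell barcodes), column labels (gene IDs), and `(row_index, col_index, count)` entries. Each entry is mapped to its labels and the labelled triplets are sorted before hashing, so two runs that differ only in row or column order yield the same digest.

```python
import hashlib


def canonical_md5(rows, cols, entries):
    """Hash a sparse count matrix independently of row/column order.

    rows:    cell barcodes, indexed by row position (assumed input shape)
    cols:    gene IDs, indexed by column position (assumed input shape)
    entries: iterable of (row_index, col_index, count) triplets
    """
    # Replace positional indices by labels, then sort into a canonical order.
    triplets = sorted((rows[i], cols[j], int(v)) for i, j, v in entries)
    digest = hashlib.md5()
    for cell, gene, count in triplets:
        digest.update(f"{cell}\t{gene}\t{count}\n".encode("utf-8"))
    return digest.hexdigest()


# Two toy "runs" holding the same counts with rows and columns swapped.
run_a = canonical_md5(["AAAC", "TTTG"], ["g1", "g2"], [(0, 0, 3), (1, 1, 5)])
run_b = canonical_md5(["TTTG", "AAAC"], ["g2", "g1"], [(0, 0, 5), (1, 1, 3)])
assert run_a == run_b
```

If digests computed this way still differ between runs, the counts themselves differ, not just their ordering.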
Hmm. Generally, I do not think it is necessary. But if it were possible to make it part of the snaps, it would surely add robustness.
@grst, do you think this should be done here, or is the PR already big enough and better handled in another one?
I mean having a snapshot is obviously better than not having one, but I won't insist on it.
Great job in general - just a few remarks :)
OK, I think I have addressed all comments but two. For 1, I can add two parameters and implement the logic; not a big deal. For 2, I realized that I did not change the test file name, so simpleaf was not tested. I ran the tests locally and everything worked well, but a weird bug showed up in GitHub Actions:
Local test log
GitHub Actions workflow error:
I could not reproduce this error locally. Any suggestions on how I can address it? Could you download the artifact of the failed job so that I can dig into it?
So it turns out that the error comes from this line in simpleaf, caused by the internal mtx to h5ad conversion (this line), where pola-rs encountered an empty file. It is strange because this file, generated by alevin-fry in this line, should at least have its header. The logic here is that simpleaf first asks alevin-fry to generate this file, then reads it and adds it to the h5ad output. So this file should definitely be there. In some runs, the file was there, but I got different md5sums for the sorted h5ad files. This also doesn't make sense because, although the columns and rows can swap, the counts of a specific gene in a specific cell should be consistent. @rob-p - Do you have any idea what happened here?
Super. I do not think I have any other comments, besides the one on the nf-tests, which would be good to have but is not necessary.
Reopen #361 after updating simpleaf central modules. See this PR. I have tested using a 10x 500 dataset. Once the modules' PR is merged, we can start merging this PR
PR checklist
- nf-core pipelines lint
- nextflow run . -profile test,docker --outdir <OUTDIR>
- nextflow run . -profile debug,test,docker --outdir <OUTDIR>
- docs/usage.md is updated.
- docs/output.md is updated.
- CHANGELOG.md is updated.
- README.md is updated (including new tool citations and authors/contributors).